Overview


iNaturalist logo

In the last decade, technology-based community science programs, such as eBird and iNaturalist, have vastly expanded our understanding of species distributions. Data collected by program participants have altered range maps and revealed new insights on species’ niches and the phenology of organisms (i.e., the timing of life history events). Concurrently, such programs also open opportunities to engage directly in the practice of collecting meaningful scientific data.

Despite the opportunities that such projects offer for scientists and the general public, the data often expose bias – both in terms of sampling and among community science participants. For example, observations are often skewed towards areas with higher human population densities, public parks, and uncommon species. Concurrently, socioeconomic status is a key indicator of participation in these programs. This biases the distribution of observations but can also offer insights into how science can be made more accessible to the public.

In this exercise, you will explore data collected by iNaturalist participants in Washington, DC in May of 2022. You will use raster and point data to explore their observations.

When I create an R Markdown file to communicate a coding process, I usually work in a .R script file and copy-and-paste the code into the R Markdown document at the end.

Grading


The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).

Points may be deducted from each question’s total:

Note: The maximum deduction is the total points value for a given question

Pay careful attention to the format of your code – not following the rules above will cost you lots of little points (and some big ones) that can really add up!

In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error. Some tips for doing so:

Click the blue button below to view the functions that you may use in completing this problem set. Make sure that you know what each function does (use ?[function name] if you do not). Do not use any functions outside of this list!

In this assignment, you may use only the following R functions (Note: If you are unclear on what a given function does, use ? to view the help file!):

  • base::=
  • base::<-
  • base::$
  • base::~
  • base::/
  • base::c
  • base::file.path
  • base::library
  • base::list.files
  • base::mean
  • base::names
  • base::tolower
  • dplyr::inner_join
  • dplyr::mutate
  • dplyr::n
  • dplyr::pull
  • dplyr::summarize
  • magrittr::%>%
  • purrr::map
  • rlang::set_names
  • sf::st_area
  • sf::st_centroid
  • sf::st_crs
  • sf::st_difference
  • sf::st_join
  • sf::st_read
  • sf::st_sf
  • sf::st_transform
  • sf::st_union
  • terra::crop
  • terra::extract
  • terra::global
  • terra::mask
  • terra::rast
  • terra::vect
  • tibble::as_tibble
  • tmap::+
  • tmap::tmap_mode
  • tmap::tm_dots
  • tmap::tm_shape
  • tmap::tm_polygons
  • tmap::tm_raster
  • tmap::tmap_mode
  • units::set_units

Note: The packages dplyr, ggplot2, magrittr, readr, rlang, and tibble are all part of the tidyverse and are loaded with library(tidyverse).

Getting started


1. [0.5] Save and knit this document:

  • [0.1] Replace my name in the YAML header with yours
  • [0.1] Add the current date in the YAML header
  • [0.3] Save the .rmd file in the output folder of your project as (but replace my name with yours): problem_set_5_Evans_Brian.rmd

2. [0.5] Load the sf, tmap, and tidyverse libraries.

library(sf)
library(tmap)
library(tidyverse)

3. [0.5] Set the tmap mode to “view” for the entire document.

tmap_mode('view')

Data loading and pre-processing


4. [1.0] Modify the code below to such that it reads in the geojson shapefiles inat_2022_may, dc_surface_water, and dc_census [0.25] and assign shapes to your global environment as a single list object. In doing so:

  • [0.25] Assign the names inat, water, and census to the individual list items;
  • [0.25] Convert the crs to EPSG 5070;
  • [0.25] Set the fields of all files to lowercase.
shapes <-
  c(
    inat = 'inat_2022_may.geojson',
    water = 'dc_surface_water.geojson',
    census = 'dc_census.geojson') %>% 
  map( 
        ~ file.path('data/raw/shapefiles', .x) %>% 
          st_read() %>% 
          set_names(
            names(.) %>% 
            tolower()) %>% 
          st_transform(crs = 5070))

Note: The income variable is the median income per census tract.

5. [1.0] Using shapes$census:

  • [0.4] Combine the census tracts into a single multipolygon;
  • [0.4] Remove water from the resultant object;
  • [0.2] Assign to your global environment with the name dc_land.
dc_land_water <-
  shapes %>%
  map(~ .x %>%
        st_union() %>%
        st_sf())

dc_land <-
  dc_land_water$census %>%
  st_difference(dc_land_water$water)

rm(dc_land_water)

6. [1.0] Using list.files to list the rasters in the folder ‘data/rasters_problem_set_5’ and a purrr::map function, modify the code below:

  • [0.25] Read in the raster files;
  • [0.25] Crop the rasters to shapes$census;
  • [0.25] Convert the resultant object to a raster stack;
  • [0.25] Mask the raster stack to shapes$census.
rasters <-
  list.files("data/rasters_problem_set_5",
             full.names = TRUE) %>%
  set_names(c("canopy_cover",
              "impervious_surface")) %>%
  purrr::map(~ terra::rast(.x) %>%
               terra::crop(shapes$census) %>%
               terra::mask(shapes$census)) %>%
  terra::rast()
## Warning: [mask] CRS do not match
## Warning: [rast] CRS do not match

Note: You may see a warning that the CRS of the two raster files are not equivalent – don’t worry, they are!

Exploring the data


7. [1.0] Calculate the mean canopy cover of the land area in Washington DC.

dc_land %>%
  terra::vect() %>%
  terra::extract(
    rasters$canopy_cover,
    .,
    mean,
    na.rm = TRUE) %>%
  pull()
## [1] 17.43166

8. [1.5] Generate a tmap [0.5] of mean canopy cover by census tract [1.0].

shapes$census %>%
  terra::vect() %>%
  terra::extract(
    rasters$canopy_cover,
    .,
    mean,
    na.rm = TRUE) %>%
  inner_join(
    shapes$census %>%
    mutate(ID = row_number()), .) %>%
  tm_shape() +
  tm_polygons(col = "canopy_cover",
              alpha = .6,
              palette = "Greens") 

9. [1.5] Generate a choropleth tmap [0.5] that displays the density [0.5] of iNaturalist observations per census tract (number of observations per unit area; area unit = km^2) [0.5].

shapes$inat %>%
  st_join(shapes$census, .) %>%
  as_tibble() %>%
  summarize(n = n(),
            .by = geoid) %>%
  inner_join(shapes$census, .) %>%
  mutate(area =
           st_area(.) %>%
           units::set_units("km^2"),
           density = n / area) %>%
  tm_shape() +
  tm_polygons(
    title = 'Observations',
    style = 'kmeans',
    col = 'density',
    palette = 'viridis') 


10. [1.5] Generate a tmap that displays (with layers ordered as follows):

  • [0.25] Canopy cover;
  • [0.25] Impervious surface cover;
  • [0.50] The median income per census tract, displayed as polygons with the fill color determined by median income;
  • [0.50] The number of iNaturalist observations per census tract displayed as centroids of tracts, with the point color determined by the total number of observations in each tract.
inat_dense <-
  shapes$inat %>%
   st_join(shapes$census, .) %>%
   as_tibble() %>%
   summarize(n = n(),
            .by = geoid) %>%
   inner_join(shapes$census, .) %>% 
   mutate(area =
          st_area(.) %>%
          units::set_units("km^2"),
          density = n / area) 


# tm_basemap(
#   c('Esri.WorldTopoMap',
#     'OpenStreetMap',
#     'Esri.WorldImagery')) +

tm_shape(rasters$canopy_cover) +
tm_raster(alpha = .6,
    palette = "Greens") +

tm_shape(rasters$impervious_surface) +
tm_raster(alpha = .6,
    palette = "OrRd") +

tm_shape(shapes$census) +
tm_polygons(col = 'income',
    palette = 'viridis') +

tm_shape(inat_dense) +
tm_dots(col = 'density',
    palette = 'magma') 

Extra credit [0.5] : Use a custom color palette for each of the layers in Question 10!